Figure 1. Map showing the collection locations of
470 specimens available after basic initial filtering. Colors represent
“population cluster” as assigned by tree method (used
hclust and cutree; tree length = 100km).
Hover over the dots to see what cluster they were assigned
to!
See Shalene’s paper on page 998 for inspiration on how to deal with this…Jha et al. 2015 Contemporary human-altered landscapes and oceanic barriers reduce bumble bee gene flow
## chi^2 df Pr(chi^2 >) Pr.exact
## btern01 97.66759 36 1.343170e-07 0.000
## bt28 59.31202 10 4.889876e-09 0.000
## b96 105.11921 28 7.376899e-11 0.000
## bt30 113.92536 66 2.288810e-04 0.000
## btms0081 13.89628 3 3.049781e-03 0.004
## btms0066 421.14552 276 3.999226e-08 0.000
## btms0083 1010.34228 465 0.000000e+00 0.000
## b126 313.28115 120 0.000000e+00 0.000
## btms0062 1146.60599 561 0.000000e+00 0.000
## btern02 954.22326 300 0.000000e+00 0.000
## btms0086 50.77823 3 5.454315e-11 0.000
## bl13 58.50354 3 1.227130e-12 0.000
## btms0059 112.09174 36 9.687809e-10 0.001
There is evidence that, globally, loci are out of HWE. But this is due to differences between subpopulations. Shalene handles this nicely in the paper noted above, so it’s something to report…but not really anything to worry about downstream.
LD between loci - not likely an issue here (p.rD and p.Ia > 0.05; no consistent linkage in the pairwise comparison either).
## Ia p.Ia rbarD p.rD
## 0.054012341 0.066000000 0.004580062 0.065000000
Null allele frequences are within the range typical for pop gen analysis with microsatellites and unlikely to be an issue. Can cite https://www.nature.com/articles/6800545 and write it like they do in https://peerj.com/articles/13565/ where they say, “Nevertheless, the frequency of null alleles inferred with the methods of Brookfield and Chakraborty (0.09 and 0.13, respectively) is in line with values commonly reported in the literature and is unlikely to cause a major bias in downstream population structure analyses (Dakin & Avise, 2004).”
## btern01 bt28 b96 bt30 btms0081
## Observed frequency 0.07366541 0.07133358 0.09233988 0.09316057 0.06801660
## Median frequency 0.07220547 0.06881528 0.09217080 0.09042780 0.06652242
## 2.5th percentile 0.03414292 0.03186540 0.04796458 0.05437336 0.02477958
## 97.5th percentile 0.11114992 0.11613681 0.13765518 0.13684859 0.11268061
## btms0066 btms0083 b126 btms0062 btern02
## Observed frequency 0.06941596 0.13619032 0.09291617 0.12542148 0.1480633
## Median frequency 0.06893741 0.13394953 0.09167348 0.12331952 0.1463794
## 2.5th percentile 0.03486506 0.09632793 0.05478723 0.08839799 0.1082095
## 97.5th percentile 0.10703151 0.17551051 0.13338109 0.16317304 0.1893019
## btms0086 bl13 btms0059
## Observed frequency 0.05457863 0.08153726 0.06513655
## Median frequency 0.05262534 0.08252742 0.06475930
## 2.5th percentile 0.01777369 0.03837588 0.02344977
## 97.5th percentile 0.09095684 0.13209085 0.10574397
There are no loci with <80% completeness
## named numeric(0)
There are no individuals with poor data in the dataset…but that is because I also excluded them earlier in the process. So this is just a sanity check.
## named numeric(0)
There are no duplicates or clones in the dataset. It’s possible that there are prior to the COLONY filtering, so if re-running analysis keeping siblings in the dataset, double check.
## #############################
## # Number of Individuals: 231
## # Number of MLG: 231
## #############################
## [1] 231
Just a little sanity check that all the loci are polymorphic (they are)
## Mode TRUE
## logical 13
From the summary call to the genind object, we find the
number of specimens per cluster, the number of private alleles (which is
not very informative as it’s correlated with sample size), and the
rarefied allelic richness (which may be informative). It’s quite
debatable how well the rarefied richness does when dealing with such
uneven population sizes. This is a case where re-running with different
clusters might be more effective (i.e. splitting apart the big ones and
then dropping the small ones for this analysis)
Only displaying populations with 5 or more specimens.
Figure 2. Observed (blue) versus expected (grey) heterozygosity across the regions. Numbers above bars are the sample size for each region.
Same (except ALL clusters, not just those with >=5 specimens) as figure but in table format for investigating to your heart’s content. Added some math for a little helper.
Pairwise Fst heat map. Not sure what’s up with the SE Minnesota population or Green Bay…but there’s also not that many specimens from them (8 and 7, respectively)
Below is with only populations having >=5 specimens
##
## Call:
## lm(formula = Fst ~ distance, data = df_fst_join)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.032784 -0.014806 -0.004775 0.007285 0.093444
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.704e-02 4.725e-03 3.607 0.000686 ***
## distance 3.819e-08 9.292e-09 4.110 0.000138 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0225 on 53 degrees of freedom
## Multiple R-squared: 0.2417, Adjusted R-squared: 0.2274
## F-statistic: 16.89 on 1 and 53 DF, p-value: 0.0001381
##
## Call:
## lm(formula = Fst ~ log(distance), data = df_fst_join)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.032783 -0.014908 -0.005903 0.005882 0.088357
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.178853 0.048422 -3.694 0.000524 ***
## log(distance) 0.016773 0.003846 4.361 5.99e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02217 on 53 degrees of freedom
## Multiple R-squared: 0.2641, Adjusted R-squared: 0.2502
## F-statistic: 19.02 on 1 and 53 DF, p-value: 5.994e-05
Insert summary table, case studies of MnZoo, Turtle Valley, and other sites with >15? 10? whatever
MN Zoo 2020
## # A tibble: 1 × 6
## n_genotyped n_colonies ml.colony.num CI.lower CI.upper prop_detected
## <int> <int> <int> <dbl> <dbl> <dbl>
## 1 18 9 16 9 31 0.562
MN Zoo 2021
## # A tibble: 1 × 6
## n_genotyped n_colonies ml.colony.num CI.lower CI.upper prop_detected
## <int> <int> <int> <dbl> <dbl> <dbl>
## 1 16 13 39 19.5 1000 0.333
Turtle Valley 2021
## # A tibble: 1 × 6
## n_genotyped n_colonies ml.colony.num CI.lower CI.upper prop_detected
## <int> <int> <int> <dbl> <dbl> <dbl>
## 1 42 27 66 49.5 121. 0.409
Determine numbers of males using the minimum homozygosity method, discuss implications
## # A tibble: 4 × 3
## sex is_het n
## <chr> <chr> <int>
## 1 female no 20
## 2 female yes 327
## 3 male no 52
## 4 male yes 63
## [1] 0.1615385
## # A tibble: 4 × 3
## sex is_het n
## <chr> <chr> <int>
## 1 female no 29
## 2 female yes 318
## 3 male no 89
## 4 male yes 26
## [1] 0.0755814
## # A tibble: 4 × 3
## site threshold prop_diploid
## <chr> <dbl> <dbl>
## 1 MN Zoo 1 0.175
## 2 MN Zoo 2 0.114
## 3 Appalachian 1 0.523
## 4 Appalachian 2 0.3
MNZoo 2020
## # A tibble: 3 × 3
## sex is_het n
## <chr> <chr> <int>
## 1 female yes 18
## 2 male no 8
## 3 male yes 1
## [1] 0.05263158
## # A tibble: 3 × 3
## sex is_het n
## <chr> <chr> <int>
## 1 female yes 18
## 2 male no 8
## 3 male yes 1
## [1] 0.05263158
mnzoo 2021
## # A tibble: 4 × 3
## sex is_het n
## <chr> <chr> <int>
## 1 female no 1
## 2 female yes 15
## 3 male no 3
## 4 male yes 6
## [1] 0.2857143
## # A tibble: 4 × 3
## sex is_het n
## <chr> <chr> <int>
## 1 female no 3
## 2 female yes 13
## 3 male no 6
## 4 male yes 3
## [1] 0.1875